Abstract: Variable length codes (VLCs) exhibit desynchronization problems when transmitted over noisy channels. Trellis decoding techniques based on Maximum A Posteriori (MAP) estimators are often used to minimize the error rate on the estimated sequence. If the number of transmitted symbols and/or bits is known by the decoder, termination constraints can be incorporated in the decoding process: all the paths in the trellis which do not lead to a valid sequence length are suppressed. This paper presents an analytic method to assess the expected error resilience of a VLC when trellis decoding with a sequence length constraint is used. The approach is based on the computation, for a given code, of the amount of information brought by the constraint. It is then shown that this quantity, as well as the probability that the VLC decoder does not resynchronize in a strict sense, is not significantly altered by an appropriate aggregation of trellis states. This proves that the performance obtained by running a length-constrained Viterbi decoder on aggregated state models approaches that obtained with the bit/symbol trellis, with a significantly reduced complexity. It is finally shown that the complexity can be further decreased by projecting the state model onto two state models of reduced size.



I. Introduction 

VLCs are widely used in compression systems due to their high compression efficiency. One drawback of VLCs is their high sensitivity to errors: a single bit error may lead to the desynchronization of the decoder. Nevertheless, many VLCs exhibit self-synchronization properties. The authors in [1] show such properties for some binary Huffman codes. The error recovery properties of VLCs have also been studied in [2], where a method to compute the so-called expected error span E_s (i.e. the expected number of source symbols over which a single bit error propagates) has been proposed. The same quantity has been called mean error propagation length (MEPL) in [3]. The authors in [4] consider the variance of the error propagation length (VEPL) to assess the resilience of a code with hard decoding techniques. In [5], the method of [2] is extended to compute the so-called synchronization gain/loss, i.e. the probability that the numbers of symbols in the transmitted and decoded sequences differ by a given amount ΔS when a single bit error occurs during the transmission. Note that various VLC constructions have also been proposed to improve the self-synchronization properties of the codes [6], [7], [8]. The author in [7] introduces a method to construct prefix-free self-synchronizing VLCs called T-codes. The synchronization property of these codes is analyzed in [9] in terms of the expected synchronization delay.

VLC soft decoding techniques based on MAP (or MMSE) estimators have also been considered to minimize the error rates (or distortion) observed on the decoded sequences. The approaches essentially differ in the optimization metrics as well as in the assumptions made on the source model and on the information available at the decoder. These assumptions lead to different trellis structures on which the estimation or soft-decoding algorithms are run. Two main types of trellises are considered to estimate the sequence of emitted symbols from the received noisy bitstream: the bit-level trellis proposed in [10] and the bit/symbol trellis. The bit-level trellis leads to a low decoding complexity. However, it does not allow the exploitation of extra information such as the number of emitted symbols, and hence suffers from some suboptimality. If the number of emitted symbols is known at the decoder, the problem is referred to as soft decoding with a length constraint and is addressed, e.g., in [11], [12], [13], [14]. This problem has led to the introduction of the bit/symbol trellis in [15]. This trellis can optimally exploit such constraints, leading to optimal performance in terms of error resilience. Nevertheless, the number of states of the bit/symbol trellis is a quadratic function of the sequence length, and the corresponding complexity is not tractable for typical sequence lengths. In order to overcome this complexity hurdle, most authors apply suboptimal estimation methods, such as sequential decoding [14], [16], [17], on this optimal state model.

This paper presents a method to assess the error resilience of VLCs when trellis decoding with a length constraint is used at the decoder side. The approach is based on the concept of gain polynomials defined on error state diagrams, introduced in [2] and [5]. The method introduced in [5] to compute the synchronization gain/loss is first recalled. This method is then extended to the case of a sequence of L(S) symbols sent over a binary symmetric channel (BSC) of given crossover probability. The derivation is inspired by the matrix method described in [3]. It has been shown in [18], [19] that the Markovian property of a source can easily be integrated in the source model by expanding the state model by a constant factor; we thus restrict the analysis to memoryless sources. It is shown that, for VLCs, the probability mass function (p.m.f.) of the synchronization gain/loss is a key indicator of the error resilience of such codes when soft decoding with a length constraint is applied at the decoder side. The p.m.f. of the gain/loss allows the computation of the probability that the symbol length of the decoded sequence is equal to L(S), i.e. the probability that the decoder resynchronizes in the strict sense (neither gain nor loss of symbols during the transmission). This quantity is given by P(ΔS = 0). The length constraint is used to discard all decoded sequences which do not satisfy the constraint ΔS = 0. If P(ΔS = 0) is low, the number of "desynchronized" sequences which will be discarded will be high. This increases the likelihood of the correct sequence, hence decreases the decoding error rate. The entropy of the p.m.f. of the gain/loss represents the amount of information that the length constraint brings to the decoder. These two quantities (P(ΔS = 0) and H(ΔS)) are shown to predict the relative decoding performance of VLCs under soft decoding with a length constraint better than the MEPL and VEPL measures (which are appropriate when hard decoding is used). Note that, in the following, the term MEPL will be used to refer to the expectation of the error propagation length.

This analysis is then used in Section III to assess the performance of MAP decoding on the aggregated state models proposed in [20], for jointly typical source/channel realizations. The aggregated state model is defined by both the internal state of the VLC decoder (i.e., the internal node of the VLC codetree) and the remainder of the Euclidean division of the symbol clock values by a fixed parameter called T. This model aggregates states of the bit/symbol trellis which differ by multiples of T symbol clock instants. The parameter T controls the trade-off between estimation accuracy and decoding complexity. The choice of this parameter indeed has an impact on the quantity of information brought by the length constraint on the corresponding trellis. It is shown that the probability that the VLC decoder does not resynchronize in a strict sense, as well as the entropy of the constraint, are not significantly altered by aggregating states, provided that the aggregation parameter T is greater than or equal to a threshold. An upper bound of this threshold is derived from the analysis of Section II. This proves that the performance obtained by running a length-constrained Viterbi decoder on the aggregated trellis closely approaches the performance obtained on the bit/symbol trellis, with a significantly reduced complexity. Finally, it is shown in Section IV that the decoding complexity can be further reduced by considering separate estimations on trellises of smaller dimensions, whose parameters T_1 and T_2 are relatively prime. If the two sequence estimates are not equal, the decoding on a trellis of parameter T_1 × T_2 is then computed. The equivalence in terms of decoding performance between this approach, referred to as combined trellis decoding, and the decoding on a trellis of parameter T_1 × T_2 is proved for the MAP criterion, i.e. for the Viterbi algorithm [21].

II. Link between VLC synchronization recovery properties and soft decoding performance with a length constraint

Let S = S_1, ..., S_t, ..., S_{L(S)} be a sequence of L(S) symbols. This sequence is encoded with a VLC C, producing a bitstream X = X_1, ..., X_k, ..., X_{L(X)} of length L(X). This bitstream is modulated using binary phase shift keying (BPSK) and is transmitted over an additive white Gaussian noise (AWGN) channel, without any channel protection. The channel is characterized by its signal to noise ratio, denoted E_b/N_0 and expressed in decibels (dB). Note that we reserve capital letters for random variables and small letters for their corresponding realizations. In this paper, the term polynomial refers to expressions of the form

$$\sum_{i \in \mathbb{Z}} a_i x^i,$$

where x denotes the variable and the a_i are the polynomial coefficients. Hence, we include in this terminology both polynomial series (with an infinite number of non-null coefficients) and finite-length polynomials (such that ∃N ∈ ℕ, ∀n > N, a_n = a_{-n} = 0), in both cases possibly with negative powers.

A. The gain/loss behavior of a variable length code 

A method to compute the so-called expected error span E_s following a single bit error has been introduced in [2]. This method relies on an error state diagram which represents the states of the decoder when the encoder is in the root node. Hence, the error state diagram includes the internal states of the decoder, i.e. the internal nodes of the VLC codetree, plus two states which respectively represent the loss of synchronization, n_l, and the return to synchronization, n_s. The set of states of the diagram is therefore {n_l, n_{a_1}, n_{a_2}, ..., n_s}, where {a_1, a_2, ...} is the set of prefixes of the VLC. The state n_s of the error state diagram corresponds to a return of both the encoder and decoder automata to the root node of the code tree. However, this state may not correspond to a strict-sense synchronization: the number of decoded symbols may differ from the number of emitted ones. The branches of the error state diagram represent the transitions between two states of the decoder when a single source symbol has been emitted by the encoder. They are labeled by an indeterminate variable z which corresponds to the encoding of one source symbol. Hence, the gain along each edge is the probability of the transition associated with that edge multiplied by z. The gain on the diagram from n_l to n_s (i.e. the transfer function between n_l and n_s) is then a polynomial in z such that the coefficient of z^i is the probability that the considered VLC resynchronizes after exactly i source symbols following the bit error. Evaluating the derivative of the gain polynomial at z = 1 provides the expected error span E_s.

The branch labeling of the error state diagram has been extended in [5] so that the gain polynomial informs about the difference, caused by a single bit error, between the numbers of emitted and decoded symbols after hard decoding of the received bitstream. This quantity, denoted ΔS, is referred to as the gain/loss. In order to evaluate the p.m.f. of the random variable ΔS, a new variable y^j is introduced in the branch labeling of the error state diagram. The exponent j represents the number of extra output symbols for each input symbol. Hence, the corresponding gain polynomial G(y, z) is a function of both variables y and z. Evaluating this polynomial at z = 1 gives a polynomial in y only. For the sake of clarity, we simply denote this polynomial as

$$G(y) = G(y, z)\big|_{z=1}. \qquad (1)$$

The coefficient of y^i in the polynomial G(y) gives the probability P(ΔS = i) following one bit error. Note that i can be negative if the decoded sequence is longer than the encoded one. In this section, we focus on the behavior of the polynomial G(y). Since the variable z is not necessary, we directly compute the state diagram for z = 1.
Let H be the transition matrix corresponding to the error state diagram:

$$H = \begin{pmatrix} P(n_l \mid n_l) & P(n_{a_1} \mid n_l) & \cdots & P(n_s \mid n_l) \\ P(n_l \mid n_{a_1}) & P(n_{a_1} \mid n_{a_1}) & \cdots & P(n_s \mid n_{a_1}) \\ \vdots & \vdots & \ddots & \vdots \\ P(n_l \mid n_s) & P(n_{a_1} \mid n_s) & \cdots & P(n_s \mid n_s) \end{pmatrix}, \qquad (2)$$



where P(n_{a_i} | n_{a_j}) represents the probability of going to state n_{a_i} from state n_{a_j}. Let us call h^k_{ij}(y) the element at row i and column j of the matrix H^k. Note that h^k_{ij}(1) is the probability of going from state n_{a_i} to state n_{a_j} in k stages, i.e. after the encoding of k source symbols. The top right elements of the matrices H and H^k are respectively denoted h(y) = P(n_s | n_l) and h^k(y). The gain polynomial G(y) can then be written as

$$G(y) = \sum_{k \geq 1} h^k(y). \qquad (3)$$

Since the series Σ_{k≥0} H^k equals (I − H)^{-1} and the top right element of H^0 = I is zero, the gain polynomial G(y) is obtained as the top right element of the matrix (I − H)^{-1}, where I denotes the identity matrix of the same dimensions as H. Note that this property holds only if (I − H)^{-1} exists.
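As a minimal sketch of Eqn. (3), the truncated series below accumulates the top right element of H^k for k ≥ 1 by propagating a row vector through the error state diagram, with polynomials in y stored as {exponent: coefficient} dictionaries. The helper names and the three-state diagram (states n_l, n_a, n_s, with the n_s row left empty so that paths stop once resynchronized) are hypothetical illustrations, not taken from any code of Table I.

```python
from collections import defaultdict

# Polynomials in y (possibly with negative powers) are {exponent: coefficient} dicts.

def poly_add(a, b):
    out = defaultdict(float, a)
    for e, c in b.items():
        out[e] += c
    return dict(out)

def poly_mul(a, b):
    out = defaultdict(float)
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] += ca * cb
    return dict(out)

def gain_polynomial(H, start, end, max_stages=500, tol=1e-15):
    """Truncated sum over k >= 1 of entry (start, end) of H^k.

    H[i][j] is the branch label (a polynomial in y, z set to 1) from state i
    to state j; an empty dict means "no branch". The row `end` is assumed
    empty, so paths stop once the decoder has resynchronized.
    """
    n = len(H)
    v = [dict() for _ in range(n)]
    v[start] = {0: 1.0}                      # row `start` of H^0 = I
    G = {}
    for _ in range(max_stages):
        nxt = [dict() for _ in range(n)]     # row `start` of H^(k+1) = v H
        for i in range(n):
            if not v[i]:
                continue
            for j in range(n):
                if H[i][j]:
                    nxt[j] = poly_add(nxt[j], poly_mul(v[i], H[i][j]))
        v = nxt
        G = poly_add(G, v[end])
        # probability mass (at y = 1) not yet absorbed in `end`
        in_transit = sum(sum(p.values()) for j, p in enumerate(v) if j != end)
        if in_transit < tol:
            break
    return G

# Hypothetical diagram: 0 = n_l, 1 = n_a, 2 = n_s.
H = [
    [{}, {0: 0.5}, {1: 0.3, -1: 0.2}],
    [{}, {0: 0.4}, {0: 0.6}],
    [{}, {}, {}],
]
G = gain_polynomial(H, start=0, end=2)
print(sorted(G.items()))
```

Propagating a single row vector avoids forming H^k explicitly; for a finite diagram, inverting I − H evaluated at numeric values of y is an alternative, but it does not directly expose the coefficients of G(y).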

Let us consider the 5-symbol source and the 16 VLCs used in [3] to illustrate these concepts. The symbol probabilities of this source, as well as the different codes, are reproduced in Table I. These codes have the same mean description length of 2.2 bits per symbol.

Example 1: Let us consider the code C_5. Its state diagram is depicted in Fig. 1. The transition matrix derived from the previous guidelines, i.e. by setting the variable z to 1 in the extended diagram of [5], is given by
This leads to

$$G(y) = 0.1023\,y^{-1} + 0.8352 + 0.0625\,y, \qquad (4)$$

which also means that P(ΔS = −1) = 0.1023, P(ΔS = 0) = 0.8352 and P(ΔS = 1) = 0.0625.



B. Extension to the BSC 

Let us recall that ΔS corresponds to the gain/loss engendered by a single bit error. We propose here to estimate P(ΔS = i) for a sequence of L(S) symbols that has been sent through a BSC of crossover probability p (equal to the bit error rate). Since in this section the VLC decoder is assumed to be a classical hard decoder, the analysis is also valid on an AWGN channel characterized by its signal to noise ratio E_b/N_0, by taking

$$p = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{E_b/N_0}\right).$$

For a sequence of L(S) symbols, the bitstream length L(X) lies in the interval of integers 𝓘 = {L(S) × l_m, ..., L(S) × l_M}, where l_m and l_M respectively denote the lengths of the shortest and longest codewords. Let E denote the random variable corresponding to the number of errors after the hard decoding of the received bitstream Y. For i ∈ ℤ, the probability P(ΔS = i) is given by

$$P(\Delta S = i) = \sum_{e \in \mathbb{N}} P(\Delta S = i \mid E = e)\, P(E = e). \qquad (5)$$

For e ∈ ℕ, the probability P(E = e) can be expressed as

$$P(E = e) = \sum_{k \in \mathcal{I}} P(E = e, L(X) = k) \qquad (6)$$

$$\phantom{P(E = e)} = \sum_{k \in \mathcal{I}} P(E = e \mid L(X) = k)\, P(L(X) = k) \quad \text{if } e \leq L(S) \times l_M, \qquad (7)$$

$$P(E = e) = 0 \quad \text{otherwise}, \qquad (8)$$

where the quantities P(E = e | L(X) = k) only depend on the signal to noise ratio and are equal to

$$P(E = e \mid L(X) = k) = \begin{cases} \binom{k}{e}\, p^e (1-p)^{k-e} & \text{if } e \leq k, \\ 0 & \text{if } e > k. \end{cases} \qquad (9)$$

For every k ∈ 𝓘, the probability P(L(X) = k) is calculated from the source statistics and the structure of the code C.
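The computation of P(E = e) in Eqns. (6) to (9) can be sketched as follows, assuming a memoryless source so that P(L(X) = k) is the L(S)-fold convolution of the codeword-length distribution. The helper names and the code {0, 10, 11} with probabilities (0.5, 0.25, 0.25) are hypothetical choices for illustration only.

```python
import math
from collections import defaultdict

def crossover_probability(ebn0_db):
    """Hard-decision BPSK over AWGN: p = 0.5 * erfc(sqrt(Eb/N0))."""
    return 0.5 * math.erfc(math.sqrt(10.0 ** (ebn0_db / 10.0)))

def bitstream_length_pmf(lengths, probs, n_symbols):
    """P(L(X) = k) for a memoryless source: L(S)-fold convolution of the
    codeword-length distribution."""
    single = defaultdict(float)
    for length, prob in zip(lengths, probs):
        single[length] += prob
    pmf = {0: 1.0}
    for _ in range(n_symbols):
        nxt = defaultdict(float)
        for k, pk in pmf.items():
            for length, pl in single.items():
                nxt[k + length] += pk * pl
        pmf = dict(nxt)
    return pmf

def error_count_pmf(length_pmf, p):
    """P(E = e) = sum_k P(E = e | L(X) = k) P(L(X) = k), binomial in e (Eqn. (9))."""
    out = defaultdict(float)
    for k, pk in length_pmf.items():
        for e in range(k + 1):               # P(E = e | L(X) = k) = 0 for e > k
            out[e] += math.comb(k, e) * p**e * (1 - p) ** (k - e) * pk
    return dict(out)

# Hypothetical example: code {0, 10, 11} with probabilities (0.5, 0.25, 0.25).
p = crossover_probability(6.0)
lx = bitstream_length_pmf([1, 2, 2], [0.5, 0.25, 0.25], n_symbols=10)
pe = error_count_pmf(lx, p)
print(p, min(lx), max(lx), max(pe))
```

The support of P(E = e) stops at L(S) × l_M, which is exactly the cut-off of Eqn. (8).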

To calculate P(ΔS = i) according to Eqn. (5), we now need to compute the quantities P(ΔS = i | E = e). For that purpose, let us assume that the decoder has already recovered from previous errors when another error occurs. This assumption requires that the probability that an error occurs while the decoder has not yet returned to the synchronization state n_s be very low: the lower the error span and the higher E_b/N_0, the more accurate this approximation. Under this assumption, multiple errors impact the quantity ΔS independently, so that the gains/losses caused by successive errors add up.

Let us define

$$G_e(y) = (\underbrace{G * \cdots * G}_{e \text{ times}})(y) = \sum_{i \in \mathbb{Z}} a_{i,e}\, y^i, \qquad (10)$$

where * denotes the convolution product (with G_0(y) = 1). Note that the polynomial G_1 = G corresponds to the gain polynomial of Eqn. (3). Under the previous assumption, the quantity a_{i,e} equals P(ΔS = i | E = e). With Eqn. (8), the resulting gain polynomial for this crossover probability can be expressed as

$$\tilde{G}(y) = \sum_{e \in \mathbb{N}} G_e(y)\, P(E = e), \qquad (11)$$

where only the quantity P(E = e) depends on E_b/N_0. The coefficients g_i of G̃ verify

$$g_i = \sum_{e \in \mathbb{N}} a_{i,e}\, P(E = e) \qquad (12)$$

$$\phantom{g_i} = \sum_{e \in \mathbb{N}} P(\Delta S = i \mid E = e)\, P(E = e) \qquad (13)$$

$$\phantom{g_i} = P(\Delta S = i). \qquad (14)$$

Let η > 0 be a criterion of negligibility. For a given η, the pseudo-degree d_η of the polynomial G̃ is defined as

$$d_\eta = \min \Big\{ d \in \mathbb{N} \;:\; \sum_{i \in \mathbb{Z} \setminus \{-d, \ldots, d\}} g_i < \eta \Big\}. \qquad (15)$$

The pseudo-degree d_η of a polynomial is thus the degree beyond which the sum of the coefficients of this polynomial is below the threshold η.

Example 2: Let us determine the pseudo-degree for η = 10^{-6}, for the code C_5, E_b/N_0 = 6 dB and L(S) = 100. The estimates of g_i obtained from Eqn. (14) lead to

$$\begin{aligned} P(\Delta S \leq -4) &= 0.0000002 \\ P(\Delta S = -3) &= 0.0000207 \\ P(\Delta S = -2) &= 0.0012587 \\ P(\Delta S = -1) &= 0.0500770 \\ P(\Delta S = 0) &= 0.9185508 \\ P(\Delta S = 1) &= 0.0296306 \\ P(\Delta S = 2) &= 0.0004578 \\ P(\Delta S = 3) &= 0.0000041 \\ P(\Delta S \geq 4) &= 0.0000001 \end{aligned} \qquad (16)$$

Hence, according to the definition of the pseudo-degree in Eqn. (15), d_η = 3. The values of P(ΔS = i) obtained by simulation and averaged over 10^7 channel realizations are

$$\begin{aligned} P(\Delta S \leq -4) &= 0.0000002 \\ P(\Delta S = -3) &= 0.0000235 \\ P(\Delta S = -2) &= 0.0013201 \\ P(\Delta S = -1) &= 0.0493389 \\ P(\Delta S = 0) &= 0.9186664 \\ P(\Delta S = 1) &= 0.0301524 \\ P(\Delta S = 2) &= 0.0004930 \\ P(\Delta S = 3) &= 0.0000053 \\ P(\Delta S \geq 4) &= 0.0000001 \end{aligned} \qquad (17)$$

and also lead to d_η = 3. The simulated values of P(ΔS = i) are close to the estimated ones for a large set of E_b/N_0 values, which validates the approximation. The pseudo-degrees for η = 10^{-6} of the codes introduced in Table I have been computed and are given in Table II.
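Eqns. (10), (11) and (15) can be sketched as below: G_e is built by repeated convolution (independent-errors assumption), mixed with P(E = e), and the pseudo-degree is found by widening {−d, ..., d} until the remaining mass falls below η. The helper names are hypothetical; the single-error p.m.f. is the one of Example 1, while the Poisson law used for P(E = e) is an assumed stand-in, since the length distribution of C_5 is not reproduced here. The pseudo-degree check reuses the estimates of Eqn. (16), with the two tails lumped at ±4.

```python
import math
from collections import defaultdict

def convolve(a, b):
    """Product of two polynomials in y, i.e. convolution of their coefficients."""
    out = defaultdict(float)
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] += ai * bj
    return dict(out)

def bsc_gain_polynomial(G, error_pmf):
    """G~(y) = sum_e G_e(y) P(E = e), with G_0 = 1 and G_e = G_(e-1) * G."""
    total = defaultdict(float)
    Ge = {0: 1.0}
    for e in range(max(error_pmf) + 1):
        pe_e = error_pmf.get(e, 0.0)
        for i, a in Ge.items():
            total[i] += a * pe_e
        Ge = convolve(Ge, G)
    return dict(total)

def pseudo_degree(pmf, eta):
    """Smallest d >= 0 with total mass outside {-d, ..., d} below eta (Eqn. (15))."""
    d = 0
    while sum(p for i, p in pmf.items() if abs(i) > d) >= eta:
        d += 1
    return d

# Single-error p.m.f. of Example 1 (code C5): key i <-> P(dS = i).
G5 = {-1: 0.1023, 0: 0.8352, 1: 0.0625}
# Assumed P(E = e): Poisson with mean 0.5, truncated at e = 10.
pe = {e: math.exp(-0.5) * 0.5**e / math.factorial(e) for e in range(11)}
G_bsc = bsc_gain_polynomial(G5, pe)

# Estimated values of Example 2 (Eqn. (16)); tails lumped at -4 and +4.
eq16 = {-4: 2e-7, -3: 2.07e-5, -2: 0.0012587, -1: 0.0500770,
        0: 0.9185508, 1: 0.0296306, 2: 0.0004578, 3: 4.1e-6, 4: 1e-7}
print(pseudo_degree(eq16, 1e-6))  # 3, as in Example 2
```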



C. Code selection criteria 

Let us consider a MAP estimation run on the bit/symbol trellis, with an additional constraint on the length of the decoded sequence. This length constraint is used to discard all decoded sequences whose number of symbols differs from the number of transmitted symbols, that is, which do not satisfy the constraint ΔS = 0. On the bit/symbol trellis, the decoder has two kinds of information to help the estimation: the excess rate of the code and the information brought by the length constraint. The excess rate of a VLC (residual redundancy in the encoded bitstream) is given by the difference between the mean description length (mdl) of the code and the entropy of the source. The information brought by the length constraint on the bit/symbol trellis is given by the entropy of the p.m.f. of the gain/loss measure ΔS. For the considered set of codes, the excess rate is equal to 0.0781 bits and is the same for all codes of Table I.

From the p.m.f. of ΔS, the following two quantities can be computed:

• the probability P(ΔS = 0) of a strict-sense resynchronization;
• the entropy H(ΔS).

If the probability P(ΔS = 0) is small, the number of "desynchronized" sequences which will be discarded will be high, so the probability of detecting and correcting errors increases. This increases the likelihood of the correct sequence, hence decreases the decoding error rate. As explained above, the entropy H(ΔS) measures the amount of information brought by the length constraint on the bit/symbol trellis. To design performance criteria for VLCs, we consider codes having the same mdl so that their performance can be fairly compared. Hence, the values P(ΔS = 0) and H(ΔS), computed from the p.m.f. of the gain/loss measure, are indicators of the performance of a VLC when soft decoding with a length constraint is applied at the decoder side. Table II shows the values of these two quantities for the codes of Table I, together with the MEPL and the VEPL of [3]. The corresponding decoding performance in terms of the normalized Levenshtein distance (NLD) [22], BER and frame error rate (FER) obtained with the bit/symbol trellis, for E_b/N_0 = 6 dB and L(S) = 100, is also given. It can be observed that
TABLE II
Pseudo-degrees d_η for η = 10^-6, proposed criteria, criteria of [3], and error resilience performance for E_b/N_0 = 6 dB and L(S) = 100 (? = value illegible in the source).

Code   d_η   P(ΔS = 0)   H(ΔS)   MEPL [3]   VEPL [3]   NLD       BER       FER
C1     ?     0.9185      ?       ?          34.721     ?         ?         ?
C2     ?     ?           0.578   ?          2.003      ?         ?         ?
C3     ?     0.8971      0.595   ?          2.107      ?         ?         ?
C4     ?     0.8913      0.608   ?          ?          ?         ?         0.31548
C5     3     0.9185      ?       ?          ?          ?         ?         ?
C6     4     0.8996      0.578   3.54546    18.854     0.00758   0.00182   0.32368
C7     5     0.7088      1.287   1.55556    0.370      0.00619   0.00154   0.21849
C8     10    0.7006      1.553   2.34861    2.045      0.00646   0.00134   0.19543
C9     9     0.6703      1.632   1.95707    1.025      0.00571   0.00123   0.16739
C10    36    0.6401      2.267   6.18182    36.231     0.00483   0.00074   0.10354
C11    8     0.8797      0.655   1.85227    2.233      0.00614   0.00183   0.32219
C12    8     0.8882      0.620   1.71678    1.506      0.00617   0.00187   0.32951
C13    8     0.8860      0.634   1.79798    1.914      0.00615   0.00182   0.32142
C14    8     0.8957      0.599   2.03104    2.952      0.00666   0.00186   0.32698
C15    8     0.8941      0.610   2.20321    4.144      0.00685   0.00189   0.33244
C16    6     0.9044      0.564   1.98086    2.615      0.00672   0.00193   0.33829

TABLE III
Pseudo-degrees, proposed criteria and error resilience performance for E_b/N_0 = 6 dB, for L(S) = 500 and L(S) = 1000.

Code   d_η   P(ΔS = 0)   H(ΔS)     BER        FER

L(S) = 500
C5     5     0.67565     1.39229   0.002210   0.90012
C7     9     0.30597     2.49437   0.002200   0.84090
C10    40    0.13111     4.38846   0.001687   0.64636

L(S) = 1000
C5     7     0.49590     1.91479   0.002290   0.99062
C7     14    0.19019     3.00963   0.002275   0.98422
C10    40    0.03215     4.97712   0.001899   0.92838

the code C10 gives the largest MEPL and VEPL. Hence, one could expect this code to lead to the worst decoding performance. However, this conclusion is valid only when hard decoding is used. When soft decoding with a length constraint is used, it can be observed that the entropy H(ΔS) better predicts the decoding performance, the code C10 giving the best performance in this case in terms of FER, BER and NLD. Similarly, the code C5 leads to the worst performance in terms of BER and FER. The same observations can be made for longer sequences (see Table III for L(S) = 500 and L(S) = 1000). The MEPL and VEPL criteria are well suited for hard decoding, whereas the two quantities P(ΔS = 0) and H(ΔS) are better suited to soft decoding with length constraints.
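The two proposed criteria are straightforward to evaluate once the p.m.f. of ΔS is known. The sketch below (hypothetical helper names) computes P(ΔS = 0) and H(ΔS) in bits from the estimated p.m.f. of Example 2; because the tails P(ΔS ≤ −4) and P(ΔS ≥ 4) are lumped into single outcomes, the entropy obtained this way slightly underestimates the true value.

```python
import math

def strict_resync_probability(pmf):
    """P(dS = 0): probability of a strict-sense resynchronization."""
    return pmf.get(0, 0.0)

def gain_loss_entropy(pmf):
    """H(dS) in bits, the information brought by the length constraint."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Estimated p.m.f. of Example 2 (code C5, Eb/N0 = 6 dB, L(S) = 100),
# with the tails P(dS <= -4) and P(dS >= 4) lumped into single outcomes.
eq16 = {-4: 2e-7, -3: 2.07e-5, -2: 0.0012587, -1: 0.0500770,
        0: 0.9185508, 1: 0.0296306, 2: 0.0004578, 3: 4.1e-6, 4: 1e-7}
print(strict_resync_probability(eq16), round(gain_loss_entropy(eq16), 4))
```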

Simulations have also been performed with a larger source alphabet. The English alphabet, together with three Huffman codes considered for this source in [2] and [5], has been used. This source and the corresponding codes are given in Table IV. These three codes have the same mean description length (4.1557 bits). Table V gives the MEPL and VEPL values, as well as the quantities P(ΔS = 0) and H(ΔS), for these codes. It also gives the FER and BER MAP decoding performance of these codes on the bit/symbol trellis. The code C17 is the worst code in terms of MEPL and VEPL, but the best according to our criteria (P(ΔS = 0) and H(ΔS)). This is confirmed by the actual FER and BER performance of this code when running the MAP decoder with the length constraint.

III. State aggregation 

The above analysis is used to assess the conditions for optimality of MAP decoding with a length constraint on the aggregated state model described in [20]. This model keeps track of the symbol clock values modulo a parameter T, instead of the symbol clock values themselves as in the classical bit/symbol trellis. The state aggregation leads to a significantly reduced decoding complexity, as detailed in Section III-B. In this section, it is shown that, from d_η (the pseudo-degree of the polynomial representation of ΔS), one can derive the minimal value of T required to obtain nearly optimal decoding performance (i.e. performance which closely approaches that obtained with the bit/symbol trellis). For these values of T, we show that the amount of information conveyed by the length constraint is not significantly altered by state aggregation.

A. Optimal state model 

The sequence of transmitted bits can be modeled as a hidden Markov model with states defined as X_k, where k represents the bit clock instants, 1 ≤ k ≤ L(X). Let N_k denote the random variable corresponding to the internal state of the VLC decoder (i.e. the internal node of the VLC codetree) at the bit clock instant k. For instance, the possible values of N_k for the code C_0 = {0, 10, 11} are n_ε and n_1, where n_ε represents the root node of the VLC codetree. In the bit-level trellis [10], the decoder state model is defined by the random variable N_k only. The internal states of the automaton associated with a given VLC are defined by the internal nodes of the codetree, as depicted in Fig. 2-a for the code C_0. The corresponding decoding trellis is given in Fig. 3-a.
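To make the bit-level automaton concrete, the following sketch derives it from a set of codewords: states are the internal nodes of the codetree (named by their bit prefix, the empty string standing for the root n_ε), and reaching a leaf emits a symbol and returns to the root. The function name is hypothetical and the code is assumed prefix-free; for C_0 = {0, 10, 11} it yields the two states n_ε and n_1 described above.

```python
def codetree_automaton(codewords):
    """Bit-level decoder automaton of a prefix-free code.

    States are the internal nodes of the codetree, named by the bit prefix that
    leads to them ('' is the root n_eps). Returns {(node, bit): (next_node,
    decoded codeword or None)}; reaching a leaf emits a symbol and returns to
    the root.
    """
    leaves = set(codewords)
    internal = {""} | {cw[:i] for cw in codewords for i in range(1, len(cw))}
    trans = {}
    for node in internal:
        for bit in "01":
            child = node + bit
            if child in leaves:
                trans[(node, bit)] = ("", child)
            elif child in internal:
                trans[(node, bit)] = (child, None)
            # otherwise there is no branch (incomplete code)
    return trans

C0 = ["0", "10", "11"]
for (node, bit), (nxt, sym) in sorted(codetree_automaton(C0).items()):
    print(repr(node), bit, "->", repr(nxt), sym)
```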
TABLE IV
Source and codes for the English alphabet used in this paper.

Symbol   Probability   C17 [2]      C18 [5]      C19 [5]
A        0.08833733    0000         0100         0100
B        0.01267680    011111       111101       000101
C        0.02081665    11111        11100        01100
D        0.04376834    00010        10110        01101
E        0.14878569    001          000          100
F        0.02455297    11100        11010        00011
G        0.01521216    011101       111011       001100
H        0.05831331    1000         1000         1100
I        0.05644515    1001         1001         1111
J        0.00080064    111010101    111111110    001110100
K        0.00867360    1110100      1111110      0011100
L        0.04123298    00011        10111        00100
M        0.02361889    11110        11011        01110
N        0.06498532    0110         0111         0101
O        0.07245796    0100         0101         1101
P        0.02575393    10111        11001        01111
Q        0.00080064    1110101000   1111111110   0011101010
R        0.06872164    0101         0110         0000
S        0.05537763    1010         1010         1110
T        0.09354149    110          001          101
U        0.02762209    10110        11000        00101
V        0.01160928    111011       111110       001111
W        0.01868161    011100       111010       001101
X        0.00146784    11101011     11111110     00111011
Y        0.01521216    011110       111100       000100
Z        0.00053376    1110101001   1111111111   0011101011


TABLE V
Proposed criteria, criteria of [3] and decoding performance of the English alphabet codes on the bit/symbol trellis for E_b/N_0 = 6 dB and L(S) = 100.

Code   P(ΔS = 0)   H(ΔS)   MEPL [3]   VEPL [3]   BER        FER
C17    0.7312      1.376   5.456      5.868      0.002082   0.53768
C18    0.8338      0.861   3.863      3.906      0.002094   0.56607
C19    0.8433      0.844   1.915      1.192      0.002105   0.56900



Let us assume that the number of transmitted symbols is perfectly known on the decoder side. To use this information as a termination constraint in the decoding process, the state model must keep track of the symbol clock (that is, of the number of decoded symbols). The optimal state model is defined by the pair of random variables (N_k, T_k) [13], [15], where T_k denotes the symbol clock instant corresponding to the bit clock instant k. Since the trellis corresponding to this model is indexed by both the bit and the symbol instants, it is often called the bit/symbol trellis. This trellis is depicted in Fig. 3-b for the code C_0. The number of states of this model is a quadratic function of the sequence length (equivalently, of the bitstream length). The resulting computational cost is thus not tractable for typical values of the sequence length L(S).

B. Aggregated state model: a brief description

The aggregated state model proposed in [20] is defined by the pair of random variables (N_k, M_k), where M_k = T_k mod T is the remainder of the Euclidean division of T_k by T. The corresponding realization of M_k is denoted m_k. Note that T = 1 and T = L(S) amount to considering respectively the bit-level trellis and the bit/symbol trellis. The automaton and decoding trellis of parameter T = 2 corresponding to this state model are depicted for the code C_0 in Figs. 2-b and 3-c respectively. The transitions which terminate in the state n_ε, that is, which correspond to the encoding/decoding of a symbol, modify M_k as M_k = (M_{k-1} + 1) mod T. Hence, the transition probabilities on this automaton are given by

$$P(N_k = n_k, M_k = m_k \mid N_{k-1} = n_{k-1}, M_{k-1} = m_{k-1}) = \begin{cases} P(N_k = n_k \mid N_{k-1} = n_{k-1}) & \text{if } n_k \neq n_\varepsilon \text{ and } m_k = m_{k-1}, \\ P(N_k = n_k \mid N_{k-1} = n_{k-1}) & \text{if } n_k = n_\varepsilon \text{ and } m_k = (m_{k-1} + 1) \bmod T, \\ 0 & \text{otherwise,} \end{cases} \qquad (18)$$
Fig. 3. Trellises for the code C_0: a) bit-level trellis (T = 1), b) bit/symbol trellis and c) trellis of parameter T = 2. The termination constraints are also depicted by circles (here L(S) is assumed to be odd).

Fig. 2. Automata of the state models for a) T = 1 and b) T = 2, corresponding respectively to the bit-level trellis and to the extended trellis with T = 2 (code C_0).



where the probabilities P(N_k = n_k | N_{k-1} = n_{k-1}) are deduced from the source statistics. Note that the transition probabilities P(N_k | N_{k-1}) are the ones used in the bit-level trellis.
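Eqn. (18) can be sketched in two steps, assuming a memoryless source: first derive P(N_k | N_{k-1}) on the codetree from the codeword probabilities (the probability of moving to a child node is the mass of the codewords below that child divided by the mass below the current node), then lift each bit-level transition to the (N_k, M_k) automaton, incrementing m modulo T only when the transition ends in the root n_ε. The helper names and the probabilities (0.5, 0.25, 0.25) for C_0 are hypothetical.

```python
from collections import defaultdict

def bit_level_transitions(codewords, probs):
    """P(N_k | N_{k-1}) on the codetree of a prefix-free code, memoryless source.

    Returns {(node, bit): (next_node, probability)}, nodes named by prefix,
    '' being the root n_eps.
    """
    mass = defaultdict(float)                 # probability of visiting each tree node
    for cw, p in zip(codewords, probs):
        for i in range(len(cw) + 1):
            mass[cw[:i]] += p
    leaves = set(codewords)
    trans = {}
    for node in list(mass):
        if node in leaves:
            continue
        for bit in "01":
            child = node + bit
            if child in mass:
                nxt = "" if child in leaves else child
                trans[(node, bit)] = (nxt, mass[child] / mass[node])
    return trans

def aggregated_transitions(bit_trans, T):
    """Transitions of the (N_k, M_k) automaton of Eqn. (18)."""
    nodes = {node for node, _ in bit_trans}
    out = {}
    for node in nodes:
        for m in range(T):
            for bit in "01":
                if (node, bit) in bit_trans:
                    nxt, p = bit_trans[(node, bit)]
                    # M_k is incremented (mod T) only when a symbol is completed
                    m_next = (m + 1) % T if nxt == "" else m
                    out[((node, m), bit)] = ((nxt, m_next), p)
    return out

# Hypothetical probabilities for C0 = {0, 10, 11}.
bt = bit_level_transitions(["0", "10", "11"], [0.5, 0.25, 0.25])
agg = aggregated_transitions(bt, T=2)
```

The transition probabilities themselves are unchanged by the aggregation; only the state labels are extended, which is why the bit-level probabilities can be reused as stated above.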

The proposed state model keeps track of the symbol clock values modulo T during the decoding process. In order to exploit this information, the decoder has to know the value $m_{L(X)} = L(S) \bmod T$. This information can be used as a termination constraint, as depicted in Fig. 3. If this value is not given by the syntax elements of the source coding system, it has to be transmitted. The transmission cost of $m_{L(X)}$ is greater than or equal to $\log_2(T)$ bits. Note that knowing this value costs less than transmitting the exact number of emitted symbols, as required by the bit/symbol trellis. In the following, the quantity $m_{L(X)}$ is assumed to be known by the decoder. The estimation is performed using the Viterbi algorithm [21], hence minimizing the FER. In the sequel, the error resilience will be measured according to this criterion. For the estimation, the paths which do not satisfy the appropriate boundary constraints, i.e. the paths that do not terminate in states of the form $(n_\varepsilon, m_{L(X)})$, are discarded. The number of states of the trellis of parameter T satisfies

$$\nu_T \leq T \times L(X) \times \nu, \tag{19}$$

where $\nu$ represents the number of internal nodes of the code.
The inequality in Eqn. (19) results from the fact that some pairs $(n_k, m_k)$ are not reachable according to the code structure. Such states are mostly located at the first and last bit clock instants of the trellis. However, for some particular codes, some states are not reachable all along the trellis. For example, for the set of codewords {0, 100, 101, 110, 111}, all codewords have an odd length, so the number of decoded symbols always has the same parity as the number of decoded bits; for even T, the states $(n_\varepsilon, 2q)$, $q \in \mathbb{N}$, are therefore not reachable at any odd bit clock instant. To approximate the complexity on a trellis of parameter T, the worst case in terms of the number of states is considered, i.e. we assume that

$$\nu_T \approx T \times L(X) \times \nu. \tag{20}$$

Hence, as the number of states of the bit-level trellis is equal to $L(X) \times \nu$, the computational cost $D_T$ corresponding to the trellis of parameter T can be approximated as

$$D_T \approx T \times D_{bl}, \tag{21}$$

where $D_{bl}$ denotes the computational cost of the bit-level trellis. This computational cost is approximately linear in the sequence length and in T.
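The state model of Eqn. (18) and the termination constraint can be illustrated with a small length-constrained Viterbi decoder. The sketch below is not the authors' implementation: it assumes a memoryless source, a hypothetical three-word code {0, 10, 11}, BPSK modulation (bit 0 → +1, bit 1 → −1) over an AWGN channel, and the illustrative helper names `build_trellis` and `viterbi_decode`. Each bit clock carries T × ν states, which is where the factor T in Eqns. (20) and (21) comes from.

```python
import math
from collections import defaultdict


def build_trellis(code, T):
    """Aggregated state model of parameter T (Eqn. (18)).

    code maps each codeword (a '0'/'1' string) to its probability; internal
    nodes are identified by their prefix and the root n_eps is ''.
    Returns {(node, m): [(bit, next_state, log_prob), ...]}.
    """
    mass = defaultdict(float)                  # probability mass of each subtree
    for cw, p in code.items():
        for k in range(len(cw) + 1):
            mass[cw[:k]] += p
    internal = {cw[:k] for cw in code for k in range(len(cw))}
    trans = {}
    for node in internal:
        for m in range(T):
            out = []
            for bit in (0, 1):
                child = node + str(bit)
                if mass[child] > 0:
                    # reaching a leaf decodes a symbol: back to the root, m -> m + 1 mod T
                    nxt = ('', (m + 1) % T) if child in code else (child, m)
                    out.append((bit, nxt, math.log(mass[child] / mass[node])))
            trans[(node, m)] = out
    return trans


def viterbi_decode(code, y, sigma, n_symbols, T):
    """Length-constrained Viterbi decoding on the trellis of parameter T.

    Only paths ending in the state (root, n_symbols mod T) are kept (termination
    constraint). Returns the decoded list of codewords, or None if no path is admissible.
    """
    trans = build_trellis(code, T)
    metric = {('', 0): 0.0}                    # best log-metric per reachable state
    back = []                                  # back[k][state] = (previous state, bit)
    for yk in y:
        new_metric, ptr = {}, {}
        for state, mval in metric.items():
            for bit, nxt, logp in trans[state]:
                x = 1.0 - 2.0 * bit            # BPSK symbol of this branch
                cand = mval + logp - (yk - x) ** 2 / (2.0 * sigma ** 2)
                if nxt not in new_metric or cand > new_metric[nxt]:
                    new_metric[nxt], ptr[nxt] = cand, (state, bit)
        metric = new_metric
        back.append(ptr)
    final = ('', n_symbols % T)
    if final not in metric:
        return None
    bits, state = [], final
    for ptr in reversed(back):                 # trace the surviving path backwards
        state, bit = ptr[state]
        bits.append(bit)
    words, cur = [], ''
    for bit in reversed(bits):                 # parse the bitstream with the prefix code
        cur += str(bit)
        if cur in code:
            words.append(cur)
            cur = ''
    return words


code = {'0': 0.5, '10': 0.25, '11': 0.25}          # hypothetical toy VLC
sent = ['10', '0', '11', '0', '0', '10']           # L(S) = 6 symbols, L(X) = 9 bits
tx = [1.0 - 2.0 * int(b) for b in ''.join(sent)]   # noiseless BPSK samples
print(viterbi_decode(code, tx, 0.5, len(sent), T=2))
```

With T = 1 the decoder reduces to the bit-level trellis; once T exceeds the largest number of symbols that L(X) bits can carry, the constraint forces exactly L(S) symbols, as on the bit/symbol trellis.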

C. Aggregated state model: analysis 

According to the definition of the pseudo-degree (Eqn. (13)) of the polynomial $G(y)$, the probability that $\Delta S$ belongs to the interval $\{-d_\eta, \dots, d_\eta\}$ is greater than or equal to $1 - \eta$. This leads to the following property.

Property 1: A value of T such that $T = d_\eta$, and all the more such that $T > d_\eta$, ensures that the Viterbi algorithm run on the aggregated trellis selects, with probability at least $1 - \eta$, a sequence with the correct number of symbols.

However, this property does not mean that the algorithm will offer results similar to those obtained on the bit/symbol trellis. To analyze the respective performance of both models, the amount of information conveyed by the termination constraint must be quantified in both cases. These quantities are respectively given by the entropies of the random variables $\Delta S \bmod T$ and $\Delta S$. They depend on the sequence length and on $E_b/N_0$, which are assumed to be fixed. Here, we show that by setting the aggregation parameter to $T = 2d_\eta + 1$, the information brought by the length constraint on the aggregated trellis, $H(\Delta S \bmod T)$, tends towards the one available on the bit/symbol trellis, $H(\Delta S)$.

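For a trellis of parameter T, and following the analysis of Section III-B, the quantities $H(\Delta S)$ and $H(\Delta S \bmod T)$ can be computed from the quantities $g_i = P(\Delta S = i)$ as

$$H(\Delta S) = -\sum_{i} g_i \log_2 g_i, \tag{22}$$

$$g_i^T = P(\Delta S \bmod T = i) = \sum_{k \in \mathbb{Z}} g_{i + kT}, \quad i \in \{0, \dots, T-1\}. \tag{23}$$

The entropy of the termination constraint on a trellis of parameter T is then given by

$$H(\Delta S \bmod T) = -\sum_{i \in \{0, \dots, T-1\}} g_i^T \log_2 g_i^T \tag{24}$$

$$H(\Delta S \bmod T) \geq H(\Delta S) + \sum_{i \notin \{0, \dots, T-1\}} g_i \log_2 g_i. \tag{25}$$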

When $T = 2d_\eta + 1$, (25) can be rewritten as

$$H(\Delta S \bmod (2d_\eta + 1)) \geq H(\Delta S) + \sum_{i \notin \{-d_\eta, \dots, d_\eta\}} g_i \log_2 g_i. \tag{26}$$

Let us now assume that $\eta < 1/e$. Then the function $x \mapsto x \log(x)$ decreases on the interval $[0, \eta]$ and, since $g_i < \eta$ for all $i \notin \{-d_\eta, \dots, d_\eta\}$, we have

$$\sum_{i \notin \{-d_\eta, \dots, d_\eta\}} g_i \log_2 g_i \geq \left|\{i \notin \{-d_\eta, \dots, d_\eta\} : g_i > 0\}\right| \, \eta \log_2 \eta, \tag{27}$$

where the cardinality $|\{i : g_i > 0\}|$ of the set of possible non-zero values of $g_i$ is bounded by the bitstream length $L(X)$. Together with $L(X) > T$, this leads to

$$\left|\{i \notin \{-d_\eta, \dots, d_\eta\} : g_i > 0\}\right| \leq L(X) - 2d_\eta - 1. \tag{28}$$




Fig. 4. Entropy of $\Delta S \bmod T$ versus T for the codes C5, C7, C9, C10 and C13.



Hence, for a given $\eta$, we have the following lower and upper bounds:

$$H(\Delta S) + (L(X) - 2d_\eta - 1)\,\eta \log_2 \eta \leq H(\Delta S \bmod (2d_\eta + 1)) \leq H(\Delta S). \tag{29}$$

These bounds mean that for $\eta$ small enough, hence for $T = 2d_\eta + 1$ sufficiently high, the quantity of information brought by the length constraint on the aggregated trellis of parameter T tends towards the one available on the bit/symbol trellis.
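The quantities in Eqns. (22) to (24), and the width of the bracket in Eqn. (29), can be evaluated numerically from the $g_i$. In the sketch below, the distribution `g` of $\Delta S$ is a hypothetical example (not one of the codes C5 to C13), and the helper names are illustrative. Folding `g` modulo T as in (23) can only merge probability mass, so $H(\Delta S \bmod T)$ never exceeds $H(\Delta S)$, and the two coincide once T is large enough for distinct values of $\Delta S$ to fall in distinct residue classes.

```python
import math


def entropy(probs):
    """Shannon entropy (bits) of an iterable of probabilities; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fold_mod(g, T):
    """Eqn. (23): g_i^T = P(dS mod T = i) for i in {0, ..., T-1}.

    g maps each value i of dS to g_i = P(dS = i); negative i are allowed.
    """
    gT = [0.0] * T
    for i, p in g.items():
        gT[i % T] += p                 # Python's % already returns a value in {0, ..., T-1}
    return gT


def entropy_mod(g, T):
    """Eqn. (24): entropy H(dS mod T) of the termination constraint."""
    return entropy(fold_mod(g, T))


def bound_gap(LX, d_eta, eta):
    """Width -(L(X) - 2 d_eta - 1) * eta * log2(eta) of the bracket in Eqn. (29)."""
    return -(LX - 2 * d_eta - 1) * eta * math.log2(eta)


# Hypothetical distribution of dS, for illustration only.
g = {-2: 0.01, -1: 0.10, 0: 0.78, 1: 0.10, 2: 0.01}
for T in (1, 2, 3, 5):
    print(T, round(entropy_mod(g, T), 4), round(entropy(g.values()), 4))
print(bound_gap(100, 3, 1e-4), bound_gap(100, 3, 1e-5))
```

For $\eta < 1/e$ the gap term shrinks as $\eta$ decreases, which is the behaviour the bounds in (29) rely on.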

Example 3: Let us consider the same parameters as in Example 2 (i.e. code C5, $d_\eta = 3$). From Eqn. (29) we deduce that

$$H(\Delta S) - H(\Delta S \bmod (2d_\eta + 1)) \leq -(L(X) - 2d_\eta - 1)\,\eta \log_2 \eta \leq 0.04900 \text{ bits}. \tag{30}$$



The convergence of $H(\Delta S \bmod T)$ is depicted in Fig. 4 for the codes of Table I. In this figure, the arrows represent the values of $H(\Delta S)$ for the considered codes. Note that for C10, $H(\Delta S \bmod T)$ has not yet converged towards $H(\Delta S)$ for T = 10. For the other codes, the limit is reached for T < 10. According to Section III-C, the best codes are those with the highest values of $H(\Delta S)$. Such codes require a higher value of T to approach the value $H(\Delta S)$ of the entropy of the termination constraint on the bit/symbol trellis, since the pseudo-degree of these codes is higher. Nevertheless, for the considered set of codes, the values of T leading to the same performance as on the bit/symbol trellis are always lower than L(X). Note that for the code C13, $H(\Delta S \bmod 2) = 0 = H(\Delta S \bmod 1)$. This means that the decoding performance of code C13 on a trellis of parameter T = 2 is the same as on the bit-level trellis (T = 1).

The previous analysis has been validated by simulation for sequences of L(S) = 100 symbols. For each parameter set (VLC, $E_b/N_0$ and T), the FER is measured over $10^5$ channel realizations. The performance for different values of the parameter T and for the codes C5, C7, C10 and C13 is given in Table VI. In this table, the best decoding performance for each code, at the different values of $E_b/N_0$, is written in italics. These values correspond to the performance obtained on the bit/symbol trellis. Note that these values are obtained for a value of T which is considerably lower than L(S). As predicted, the trellis of parameter T = 2 does not bring any improvement in terms of error resilience for the code C13 compared to the bit-level trellis. These results validate the criteria described in Section III-C to select good codes in terms of error resilience. Indeed, according to these criteria and to the simulation results, the best code among the ones proposed in Table II is the code C10 and the worst is the code C5.

TABLE VI
FER FOR SOFT DECODING (VITERBI) WITH DIFFERENT VALUES OF THE AGGREGATION PARAMETER T.

| Code | T | Eb/N0 = 3 dB | 4 dB | 5 dB | 6 dB | 7 dB |
|---|---|---|---|---|---|---|
| C5 | 1 | 0.99120 | 0.92330 | 0.70464 | 0.38774 | 0.14558 |
| C5 | 2 | 0.98805 | 0.90368 | 0.66193 | 0.34633 | 0.12452 |
| C5 | 3 | 0.98698 | 0.89901 | 0.65527 | 0.34313 | 0.12388 |
| C5 | 4 | 0.98665 | 0.89795 | 0.65457 | 0.34298 | 0.12386 |
| C5 | 5 | 0.98652 | 0.89782 | 0.65449 | 0.34296 | |
| C5 | 10 | 0.98651 | 0.89780 | 0.65448 | | |
| C5 | bit/symb. (T = 100) | 0.98651 | 0.89780 | 0.65448 | 0.34296 | 0.12386 |
| C7 | 1 | 0.99182 | 0.92604 | 0.71405 | 0.39372 | 0.14885 |
| C7 | 2 | 0.98634 | 0.88506 | 0.59864 | 0.25742 | 0.06997 |
| C7 | 3 | 0.98247 | 0.86379 | 0.55406 | 0.22571 | 0.06152 |
| C7 | 4 | 0.98005 | 0.85387 | 0.53964 | 0.21947 | 0.06059 |
| C7 | 5 | 0.97893 | 0.84960 | 0.53581 | 0.21866 | 0.06057 |
| C7 | 10 | 0.97773 | 0.84731 | 0.53468 | 0.21849 | |
| C7 | 20 | 0.97772 | | | | |
| C7 | bit/symb. (T = 100) | 0.97772 | 0.84731 | 0.53468 | 0.21849 | 0.06057 |
| C10 | 1 | 0.97993 | 0.87316 | 0.61783 | 0.31353 | 0.11390 |
| C10 | 2 | 0.96917 | 0.82122 | 0.51758 | 0.22232 | 0.06832 |
| C10 | 3 | 0.96092 | 0.78516 | 0.46126 | 0.18023 | 0.05207 |
| C10 | 4 | 0.95331 | 0.75512 | 0.41127 | 0.14437 | 0.03718 |
| C10 | 5 | 0.94755 | 0.73502 | 0.38403 | 0.12851 | 0.03226 |
| C10 | 10 | 0.93238 | 0.68744 | 0.33174 | 0.10496 | 0.02631 |
| C10 | 20 | 0.92801 | 0.67825 | 0.32560 | 0.10354 | 0.02610 |
| C10 | 30 | 0.92791 | 0.67811 | 0.32558 | | |
| C10 | bit/symb. (T = 100) | 0.92791 | 0.67811 | 0.32558 | 0.10354 | 0.02610 |
| C13 | 1 | 0.98973 | 0.91752 | 0.69351 | 0.38031 | 0.14431 |
| C13 | 2 | 0.98973 | 0.91752 | 0.69351 | 0.38031 | 0.14431 |
| C13 | 3 | 0.98369 | 0.88547 | 0.62816 | 0.32182 | 0.11644 |
| C13 | 4 | 0.98552 | 0.89259 | 0.63858 | 0.32711 | 0.11762 |
| C13 | 5 | 0.98286 | 0.88356 | 0.62642 | 0.32142 | 0.11638 |
| C13 | 10 | 0.98286 | 0.88356 | 0.62642 | | |
| C13 | 20 | 0.98277 | 0.88348 | 0.62638 | | |
| C13 | bit/symb. (T = 100) | 0.98277 | 0.88348 | 0.62638 | 0.32142 | 0.11638 |

IV. Combined Trellis Decoding
A. Motivation

In this section, we propose an approach allowing a further reduction of the decoding complexity without inducing any suboptimality in terms of decoding performance. The optimality of this approach is proved for the FER criterion. This approach is motivated by the following equivalence,

$$L(S) \bmod (T_1 \times T_2) = m \iff \begin{cases} L(S) \bmod T_1 = m \bmod T_1, \\ L(S) \bmod T_2 = m \bmod T_2, \end{cases} \tag{31}$$

which is satisfied if $T_1$ and $T_2$ are relatively prime. Note that the direct implication ($\Rightarrow$) always holds, whereas, if $T_1$ and $T_2$ are not relatively prime, the converse implication ($\Leftarrow$) is not satisfied.
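The equivalence (31) is the Chinese remainder theorem applied to the symbol count. The brute-force check below (function names are hypothetical) confirms it for small coprime parameters and exhibits the failure of the converse for $T_1 = 2$, $T_2 = 4$: for instance, with L = 4 and m = 0, 4 mod 8 ≠ 0 although 4 ≡ 0 modulo both 2 and 4.

```python
from math import gcd


def coprime(a, b):
    """True when a and b are relatively prime."""
    return gcd(a, b) == 1


def counterexamples(T1, T2, max_len=200):
    """Brute-force check of the equivalence (31).

    Scans lengths L < max_len and residues m in {0, ..., T1*T2 - 1} and returns
    the tuples (L, m, lhs, rhs) for which the two sides of (31) disagree.
    """
    T = T1 * T2
    bad = []
    for L in range(max_len):
        for m in range(T):
            lhs = L % T == m
            rhs = L % T1 == m % T1 and L % T2 == m % T2
            if lhs != rhs:
                bad.append((L, m, lhs, rhs))
    return bad


print(counterexamples(3, 4))        # coprime parameters: no disagreement
print(counterexamples(2, 4)[:3])    # not coprime: only the converse (<=) fails
```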

Property 2: Let us assume that $T_1$ and $T_2$ are relatively prime and that $T_3 = T_1 \times T_2$. Let us denote by $\hat{S}_1$, $\hat{S}_2$ and $\hat{S}_3$ the estimates of S provided by the Viterbi algorithm run on the trellises of parameters $T_1$, $T_2$ and $T_3$ respectively. Then, we have

$$\hat{S}_1 = \hat{S}_2 \implies \hat{S}_3 = \hat{S}_1 = \hat{S}_2. \tag{32}$$

Proof: Let us first emphasize that the probability of a sequence, computed by the Viterbi algorithm on a trellis of parameter T, does not depend on T. Let us assume that if two sequences have the same probability, then a subsidiary rule is applied to select one sequence amongst the two. For instance, the lexicographical order can be chosen as a comparison rule. Such a rule ensures that the behavior of the Viterbi algorithm is deterministic. Let

$$\mathcal{S}_T \triangleq \{ s' \,/\, L(s') \bmod T = L(S) \bmod T \} \tag{33}$$

be the set of sequences satisfying the termination constraint for the trellis of parameter T. From Eqn. (31) we deduce that if $T_3 = T_1 \times T_2$ with $T_1$ and $T_2$ relatively prime, then

$$\mathcal{S}_{T_3} = \mathcal{S}_{T_1} \cap \mathcal{S}_{T_2}, \tag{34}$$

hence,

$$\mathcal{S}_{T_3} \subseteq \mathcal{S}_{T_1}. \tag{35}$$

Moreover, since we have assumed that $\hat{S}_1 = \hat{S}_2$, we get

$$\hat{S}_1 \in \mathcal{S}_{T_3}. \tag{36}$$

The estimate $\hat{S}_1$ provided by the Viterbi algorithm applied on the trellis of parameter $T_1$ is then such that

$$\hat{S}_1 = \arg\max_{s' \in \mathcal{S}_{T_1}} P(s' \mid X) \tag{37}$$

$$\hat{S}_1 = \arg\max_{s' \in \mathcal{S}_{T_3}} P(s' \mid X) \tag{38}$$

$$\hat{S}_1 = \hat{S}_3, \tag{39}$$

where the subsidiary rule may be used in the selection of the maximum. Equality (38) holds because, by (35) and (36), the maximizer $\hat{S}_1$ over $\mathcal{S}_{T_1}$ belongs to the subset $\mathcal{S}_{T_3}$, and is therefore also the maximizer over $\mathcal{S}_{T_3}$. This concludes the proof. ■

This property means that if a sequence is selected by the trellises of parameters $T_1$ and $T_2$, then this sequence is also selected by the trellis of parameter $T_3$.
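Property 2 can also be checked empirically. The sketch below replaces the Viterbi search by an exhaustive constrained MAP search over a hypothetical random list of candidate sequences (the names `constrained_map`, `random_instance` and `check_property2` are illustrative only). Since the Viterbi algorithm returns the constrained maximum exactly, it is the set-level argument of the proof, (33) to (39), that is exercised here; ties are broken lexicographically, as in the proof, and coarse log-probabilities make such ties frequent.

```python
import random


def constrained_map(candidates, L_true, T):
    """Brute-force stand-in for the Viterbi algorithm on the trellis of parameter T.

    candidates: list of (sequence, log_probability) pairs, sequences being tuples.
    Returns the most probable candidate whose length L satisfies
    L mod T == L_true mod T; ties go to the lexicographically smallest sequence.
    """
    admissible = [c for c in candidates if len(c[0]) % T == L_true % T]
    if not admissible:
        return None
    return min(admissible, key=lambda c: (-c[1], c[0]))[0]


def random_instance(rng, n_candidates=40, max_len=12):
    """Hypothetical random candidate set with rounded log-probabilities."""
    cands = {}
    for _ in range(n_candidates):
        seq = tuple(rng.randint(0, 2) for _ in range(rng.randint(1, max_len)))
        cands[seq] = round(rng.gauss(0.0, 1.0), 1)
    return list(cands.items())


def check_property2(T1, T2, trials=2000, seed=0):
    """Count the trials where S1 == S2 and check that S3 agrees in each of them."""
    rng = random.Random(seed)
    agreements = 0
    for _ in range(trials):
        cands = random_instance(rng)
        L_true = len(rng.choice(cands)[0])
        s1 = constrained_map(cands, L_true, T1)
        s2 = constrained_map(cands, L_true, T2)
        if s1 == s2:
            agreements += 1
            assert constrained_map(cands, L_true, T1 * T2) == s1
    return agreements


print(check_property2(2, 3), check_property2(3, 4))
```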

B. The decoding algorithm

The purpose of the algorithm described in this section is to exploit Property 2. The corresponding approach is referred to as combined trellis decoding. The rationale behind this approach is to use two trellises of parameters $T_1$ and $T_2$ instead of the trellis of parameter $T = T_1 \times T_2$ in order to reduce the overall decoding complexity. We also assume that the greatest common divisor (gcd) of $T_1$ and $T_2$ is 1, i.e. that $T_1$ and $T_2$ are relatively prime. The decoding of a sequence proceeds as follows:

1) The Viterbi algorithm is applied to both trellises of parameters $T_1$ and $T_2$. They respectively provide the estimated sequences $\hat{S}_1$ and $\hat{S}_2$.

2) If $\hat{S}_1 = \hat{S}_2$, this common sequence is used as the estimate of the emitted sequence.

3) Otherwise, the Viterbi algorithm is applied to the trellis of parameter $T_1 \times T_2$.

According to Property 2, if the same sequence is selected by both trellises of parameters $T_1$ and $T_2$, this sequence is also selected by the trellis of parameter $T_1 \times T_2$. Hence, the performance of the above 3-step decoding algorithm of parameters $T_1$ and $T_2$ is equivalent to that of a Viterbi decoder operating on the trellis of parameter $T_1 \times T_2$.
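The three steps can be written as a small driver around any length-constrained decoder. In this sketch, `decode(T)` is an assumed callable standing for the Viterbi algorithm on the trellis of parameter T, and `toy_decode` with its `CANDIDATES` table is a hypothetical stand-in used only to exercise both branches of the algorithm.

```python
from math import gcd


def combined_decode(decode, T1, T2):
    """Three-step combined trellis decoding.

    decode(T) is assumed to return the estimate of a length-constrained Viterbi
    decoder on the trellis of parameter T. Returns the estimate together with
    the list of trellis parameters that were actually decoded.
    """
    if gcd(T1, T2) != 1:
        raise ValueError("T1 and T2 must be relatively prime")
    s1, s2 = decode(T1), decode(T2)                # step 1
    if s1 == s2:                                   # step 2: valid by Property 2
        return s1, [T1, T2]
    return decode(T1 * T2), [T1, T2, T1 * T2]      # step 3


# Hypothetical stand-in decoder: MAP over an explicit list of candidate sequences
# (with log-probabilities) for an emitted sequence of 3 symbols.
CANDIDATES = {(0, 1, 1): -1.0, (1, 0): -1.5, (1, 1, 0, 1): -0.7, (0, 0, 1, 0, 1): -2.0}


def toy_decode(T, L_true=3):
    admissible = [c for c in CANDIDATES.items() if len(c[0]) % T == L_true % T]
    return min(admissible, key=lambda c: (-c[1], c[0]))[0]


print(combined_decode(toy_decode, 2, 3))   # the two estimates agree: two trellises suffice
print(combined_decode(toy_decode, 1, 2))   # the estimates differ: the third trellis is run
```

The returned list of decoded parameters makes the cost accounting of the next subsection explicit: the product trellis is only run when the two cheaper estimates disagree.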

C. Expected computational cost of the proposed algorithm 

First, let us recall that if T = 1, the resulting trellis is 
equivalent to the bit-level trellis. If T is greater than or equal 
to L(S) - + 1 (hence greater than or equal to T(S)), the 
trellis is equivalent to the bit/symbol trellis. The intermediate 
values of T amount to considering trellises whose complexity is lower than that of the bit/symbol trellis (see Section III-B). The expectation $D_{mtd}(T_1, T_2)$ of the computational cost of the proposed decoding scheme is then given by

$$D_{mtd}(T_1, T_2) = T_1 D_{bl} + T_2 D_{bl} + p\, T_1 T_2 D_{bl}, \tag{40}$$

where $p = P(\hat{S}_1 \neq \hat{S}_2)$. In the following, $D_{mtd}(T_1, T_2)$ will be denoted $D_{mtd}$. The proposed method is worthwhile in terms of computational cost if $D_{mtd} < T_1 \times T_2 \times D_{bl}$, i.e. if

$$p < p^* = 1 - \frac{T_1 + T_2}{T_1 \times T_2}. \tag{41}$$

Therefore, the benefit of the proposed algorithm depends on the probability p that the two estimators return different sequence estimates. The probability p increases when the channel noise and/or the sequence length increase. Fig. 5 illustrates the complexity reduction brought by the combined trellis decoding algorithm for the same decoding performance. For the considered settings, a lower computational cost is obtained with this approach as long as $E_b/N_0$ is greater than 0.65 dB.
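Eqns. (40) and (41) are simple enough to evaluate directly. The sketch below expresses costs in units of $D_{bl}$ (an assumption made only for readability; the function names are illustrative). For $T_1 = 3$ and $T_2 = 4$ it recovers the break-even probability $p^* = 1 - 7/12 \approx 0.417$ quoted with Fig. 5.

```python
def expected_cost(T1, T2, p, D_bl=1.0):
    """Eqn. (40): expected cost of combined decoding; p = P(S1_hat != S2_hat)."""
    return T1 * D_bl + T2 * D_bl + p * T1 * T2 * D_bl


def break_even_p(T1, T2):
    """Eqn. (41): combined decoding is cheaper than the single trellis T1*T2 iff p < p*."""
    return 1.0 - (T1 + T2) / (T1 * T2)


p_star = break_even_p(3, 4)
print(round(p_star, 3))                              # 0.417
print(expected_cost(3, 4, 0.1), expected_cost(3, 4, 0.6), 3 * 4 * 1.0)
```

At $p = p^*$ both approaches cost exactly $T_1 T_2 D_{bl}$; below it, the two small trellises win.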

D. Constrained optimization of trellis parameters $T_1$ and $T_2$

Let $T_c$ be the parameter of a trellis achieving a targeted decoding performance. According to the combined trellis decoding scheme described above, this level of performance can be reached using two trellises of parameters $T_1$ and $T_2$ such that $T_1 \times T_2 = T_c$, $T_1$ and $T_2$ being relatively prime. Without loss of generality, let us assume that $T_2 = T_1 + \Delta T$, and $T_c = T_1 \times (T_1 + \Delta T)$. Note that scanning the set $\mathbb{N}^* \times \mathbb{N}^*$ with the pairs $(T_1, \Delta T)$ covers the set of attainable constraints. The probability p is a function of $T_1$ and $T_2$, hence a function of $T_1$ and $\Delta T$. The computational cost $D_{mtd}$ of the combined trellis decoding algorithm of parameters $T_1$ and $T_1 + \Delta T$ is given by

$$D_{mtd}(T_1, \Delta T) = p(T_1, \Delta T)\, D_{T_c} + D_{bl} \times (2T_1 + \Delta T). \tag{42}$$

The quantity $p(T_1, \Delta T)$ represents the probability that the trellises of parameters $T_1$ and $T_1 + \Delta T$ do not provide the same estimate. This quantity can hence be assumed to increase with $\Delta T$. This assumption may not be satisfied for codes having specific synchronization recovery properties. For example, according to Section III, even values of T are not appropriate for the code C13: for this code, a trellis of parameter $T = 2q - 1$, $q \in \mathbb{N}^*$, provides better decoding performance than a trellis of parameter $T = 2q$. The previous assumption is therefore not always satisfied for this specific code.

Fig. 5. Computational cost of the combined trellis decoding approach (2 trellises: $T_1 = 3$, $T_2 = 4$) versus $E_b/N_0$, compared with the computational cost of a single trellis decoding approach (1 trellis: $T = 12$), for parameters ($T_1 = 3$, $T_2 = 4$, $T_3 = 12$). The corresponding cut-off value of $E_b/N_0$ is also depicted and is obtained for $p^* = 0.417$.

Under the assumption that $p(T_1, \Delta T)$ increases with $\Delta T$, we deduce the following property from Eqn. (42).

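Property 3: Let $T_c \in \mathbb{N}^*$ and let $K_p \subset \mathbb{N}^* \times \mathbb{N}^*$ be the set of pairs of relatively prime positive integers. Then

$$\arg\min_{\substack{(T_1, T_2) \in K_p \\ T_1 \times T_2 = T_c}} D_{mtd} = \arg\min_{\substack{(T_1, T_2) \in K_p \\ T_1 \times T_2 = T_c}} |T_2 - T_1|. \tag{43}$$

According to this property, pairs $(T_1, T_2)$ such that $T_2 = T_1 + 1$ are optimal; since two consecutive integers are always relatively prime, such pairs belong to $K_p$ whenever $T_c = T_1 (T_1 + 1)$.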
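Under the same monotonicity assumption on p, choosing $(T_1, T_2)$ for a target $T_c$ reduces to a search over the coprime factorizations of $T_c$. The sketch below (function names are hypothetical) enumerates them and returns the closest pair. For a fixed product, the closest pair also minimizes $T_1 + T_2$, i.e. the $D_{bl} \times (2T_1 + \Delta T)$ term of (42). A pair with $T_2 = T_1 + 1$ only exists when $T_c = T_1(T_1 + 1)$, e.g. $T_c = 12$ or $30$; for $T_c = 36$ the closest coprime pair is (4, 9).

```python
from math import gcd, isqrt


def coprime_factor_pairs(Tc):
    """All pairs (T1, T2) with T1 <= T2, T1 * T2 == Tc and gcd(T1, T2) == 1."""
    return [(t, Tc // t) for t in range(1, isqrt(Tc) + 1)
            if Tc % t == 0 and gcd(t, Tc // t) == 1]


def best_pair(Tc):
    """Property 3 (assuming p grows with T2 - T1): the closest coprime factorization."""
    return min(coprime_factor_pairs(Tc), key=lambda pair: pair[1] - pair[0])


for Tc in (12, 30, 36, 56):
    print(Tc, coprime_factor_pairs(Tc), best_pair(Tc))
```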



V. Conclusion

This paper establishes a link between the re-synchronization properties of VLCs and length-constrained MAP estimation techniques for these codes. This analysis is also used to assess conditions for the optimality of state aggregation on the bit/symbol trellis, which is widely used for soft decoding of VLC-encoded sources. Nearly optimal decoding performance can be achieved with a reduced decoding complexity with respect to the classical bit/symbol trellis. A combined trellis decoding algorithm, further reducing the decoding complexity without inducing suboptimality, is then proposed. The aggregated trellises can easily be coupled with a convolutional code or a turbo code in an iterative structure, as done in [13] and [23].